Estimating the probability of failure for complex real-world systems using high-fidelity computational models is often prohibitively expensive, especially when the probability is small. Exploiting low-fidelity models can make this process more feasible, but merging information from multiple low-fidelity and high-fidelity models poses several challenges. This paper presents a robust multi-fidelity surrogate modeling strategy in which the multi-fidelity surrogate is assembled using an active learning strategy with on-the-fly model adequacy assessment, set within a subset simulation framework for efficient reliability analysis. The surrogate is constructed by first applying a Gaussian process correction to each low-fidelity model and assigning a model probability based on the model's local predictive accuracy and cost. Three strategies are proposed to fuse these individual surrogates into an overall surrogate model, based on model averaging and deterministic/stochastic model selection; the strategies also dictate which model evaluations are necessary. No assumptions are made about the relationships between low-fidelity models, while the high-fidelity model is assumed to be the most accurate and most computationally expensive. Through two analytical and two numerical case studies, including one evaluating the failure probability of tristructural isotropic (TRISO) coated nuclear fuels, the algorithm is shown to be highly accurate while drastically reducing the number of high-fidelity model calls (and hence computational cost).
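As a hedged illustration of the correction step described above, the sketch below fits a Gaussian process to the discrepancy between a high-fidelity and a low-fidelity model; `f_hi` and `f_lo` are toy stand-ins, not the paper's models, and the returned standard deviation is what an active-learning criterion could consume:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Hypothetical stand-ins: f_hi is accurate but expensive, f_lo cheap but biased.
f_hi = lambda x: np.sin(8 * x).ravel() + x.ravel()
f_lo = lambda x: np.sin(8 * x).ravel()

X = np.linspace(0, 1, 8).reshape(-1, 1)        # a few expensive HF calls
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2))
gp.fit(X, f_hi(X) - f_lo(X))                   # learn the LF discrepancy

def corrected_surrogate(x):
    """Cheap LF prediction plus GP correction; the predictive standard
    deviation can feed the model-probability / active-learning logic."""
    corr, std = gp.predict(x, return_std=True)
    return f_lo(x) + corr, std
```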
Tristructural isotropic (TRISO) coated particle fuel is a robust nuclear fuel, and determining its reliability is critical to the success of advanced nuclear technologies. However, TRISO failure probabilities are small and the associated computational models are expensive. We used coupled active learning, multifidelity modeling, and subset simulation to estimate the failure probabilities of TRISO fuels using several 1D and 2D models. With multifidelity modeling, we replaced expensive high-fidelity (HF) model evaluations with information fusion from two low-fidelity (LF) models. For the 1D TRISO models, we considered three multifidelity modeling strategies: Kriging only, Kriging LF prediction plus Kriging correction, and deep neural network (DNN) LF prediction plus Kriging correction. While the results of these strategies compared satisfactorily, the strategies fusing information from the two LF models generally called the HF model least often. Next, for the 2D TRISO model, we considered two multifidelity modeling strategies: DNN LF prediction plus Kriging correction (data-driven) and 1D TRISO LF prediction plus Kriging correction (physics-based). As expected, the physics-based strategy consistently required the fewest calls to the HF model. However, the data-driven strategy had a lower overall simulation time, since the DNN predictions are instantaneous while the 1D TRISO model requires non-negligible simulation time.
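Both abstracts above rely on subset simulation to reach small failure probabilities. A minimal sketch of the core mechanism follows, assuming standard-normal inputs and a toy limit state `g`; the conditional-sampling step here is a bare-bones pCN move, far simpler than the samplers used in the papers:

```python
import numpy as np

def subset_simulation(g, dim, n=1000, p0=0.1, max_levels=20, seed=0):
    """Estimate P[g(X) <= 0] for X ~ N(0, I) when failure is rare.
    Each level keeps the p0-fraction of samples closest to failure,
    then regrows the population with an N(0, I)-preserving MCMC step."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, dim))
    G = np.array([g(x) for x in X])
    p = 1.0
    for _ in range(max_levels):
        thresh = np.quantile(G, p0)
        if thresh <= 0:                       # failure domain reached
            return p * np.mean(G <= 0)
        p *= p0
        seeds = X[G <= thresh]
        X, G = [], []
        for x in seeds:
            gx = g(x)
            for _ in range(int(1 / p0)):      # pCN proposal keeps N(0, I)
                cand = 0.8 * x + 0.6 * rng.standard_normal(dim)
                gc = g(cand)
                if gc <= thresh:              # accept only inside the level
                    x, gx = cand, gc
                X.append(x.copy())
                G.append(gx)
        X, G = np.array(X), np.array(G)
    return p * np.mean(G <= 0)

# Toy rare event: P[max_i |x_i| >= 3] in 2D (about 5e-3).
print(subset_simulation(lambda x: 3.0 - np.abs(x).max(), dim=2))
```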
This white paper lays out a vision of research and development in the field of artificial intelligence for the next decade (and beyond). Its denouement is a cyber-physical ecosystem of natural and synthetic sense-making, in which humans are integral participants: what we call "shared intelligence". This vision is premised on active inference, a formulation of adaptive behavior that can be read as a physics of intelligence, and which inherits from the physics of self-organization. In this context, we understand intelligence as the capacity to accumulate evidence for a generative model of one's sensed world, also known as self-evidencing. Formally, this corresponds to maximizing (Bayesian) model evidence via belief updating over several scales: i.e., inference, learning, and model selection. Operationally, this self-evidencing can be realized via (variational) message passing or belief propagation on a factor graph. Crucially, active inference foregrounds an existential imperative of intelligent systems; namely, curiosity, or the resolution of uncertainty. This same imperative underwrites belief sharing in ensembles of agents, in which certain aspects (i.e., factors) of each agent's generative world model provide a common ground or frame of reference. Active inference plays a foundational role in this ecology of belief sharing, leading to a formal account of collective intelligence that rests on shared narratives and goals. We also consider the kinds of communication protocols that must be developed to enable such an ecosystem of intelligences, and motivate the development of a shared hyper-spatial modeling language and transaction protocol as a first, and key, step towards such an ecology.
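The belief-updating step can be made concrete with a toy, single-factor example; the likelihood matrix `A` and prior `D` below are invented for illustration, and the exact Bayesian update shown is the fixed point that variational message passing converges to in this simple case:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

# Toy discrete generative model: A[o, s] = P(o | s), D[s] = prior P(s).
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])
D = np.array([0.5, 0.5])

def update_beliefs(o):
    """Posterior over hidden states given observation o; self-evidencing
    here is just accumulating log-evidence for states that explain o."""
    return softmax(np.log(A[o]) + np.log(D))

print(update_beliefs(o=0))   # beliefs shift toward the state explaining o
```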
We develop a Bayesian semi-parametric model for estimating the impact of dynamic treatment rules on survival among patients diagnosed with pediatric acute myeloid leukemia (AML). The data consist of a subset of patients enrolled in the phase III AAML1031 clinical trial, in which patients move through a sequence of four treatment courses. At each course, they undergo treatment that may or may not include anthracyclines (ACT). While ACT is known to be effective at treating AML, it is also cardiotoxic and can lead to early death for some patients. Our task is to estimate the potential survival probability under hypothetical dynamic ACT treatment strategies, but there are several impediments. First, since ACT was not randomized in the trial, its effect on survival is confounded over time. Second, subjects initiate the next course depending on when they recover from the previous course, making timing potentially informative of subsequent treatment and survival. Third, patients may die or drop out before ever completing the full treatment sequence. We develop a generative Bayesian semi-parametric model based on Gamma process priors to address these complexities. At each treatment course, the model captures subjects' transition to subsequent treatment or death in continuous time under a given rule. A g-computation procedure is used to compute a posterior over the potential survival probability that is adjusted for time-varying confounding. Using this approach, we conduct posterior inference for the efficacy of hypothetical treatment rules that dynamically modify ACT based on evolving cardiac function.
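A rough sketch of the g-computation step may help; the transition model and treatment rule below are invented toys, whereas in the paper each transition would be drawn from the posterior of the Gamma-process model, with the whole simulation repeated per posterior draw:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_transition(cardiac, act):
    """Toy stand-in for a posterior transition draw: ACT degrades cardiac
    function, and death risk rises when ACT meets a weakened heart."""
    hazard = 0.05 + 0.10 * act * (1 - cardiac)
    cardiac = max(0.0, cardiac - 0.1 * act)
    return cardiac, rng.random() > hazard      # (new state, survived?)

def rule(cardiac):
    return 1 if cardiac > 0.7 else 0           # withhold ACT if heart is weak

def g_computation(rule, n_courses=4, n_sim=10_000):
    """Monte Carlo g-computation: simulate forward under the rule and
    average survival, which adjusts for time-varying confounding."""
    survived = 0
    for _ in range(n_sim):
        cardiac, alive = 1.0, True
        for _ in range(n_courses):
            cardiac, alive = sample_transition(cardiac, rule(cardiac))
            if not alive:
                break
        survived += alive
    return survived / n_sim

print(g_computation(rule))   # potential survival probability under the rule
```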
The Transformer is an extremely powerful and prominent deep learning architecture. In this work, we challenge the commonly held belief in deep learning that going deeper is better, and show an alternative design approach: building wider attention Transformers. We demonstrate that wide single-layer Transformer models can compete with or outperform deeper ones in a variety of Natural Language Processing (NLP) tasks when both are trained from scratch. The impact of changing the model aspect ratio on Transformers is then studied systematically. This ratio balances the number of layers against the number of attention heads per layer, while keeping the total number of attention heads and all other hyperparameters constant. On average, across 4 NLP tasks and 10 attention types, single-layer wide models perform 0.3% better than their deep counterparts. We provide an in-depth evaluation and demonstrate how wide models require a far smaller memory footprint and can run faster on commodity hardware; in addition, these wider models are also more interpretable. For example, a single-layer Transformer for byte-level text classification on IMDb has 3.1x faster inference latency on a CPU than its equally accurate deeper counterpart, and is half the size. We therefore put forward wider and shallower models as a viable and desirable alternative for small models on NLP tasks, and as an important area of research for domains beyond this.
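The aspect-ratio experiment can be sketched in a few lines; the sizes below are illustrative rather than the paper's configurations, but they show how heads are traded between layers while the total head count and other hyperparameters stay fixed:

```python
import torch.nn as nn

def make_encoder(total_heads=16, n_layers=1, d_model=256):
    """Fix the total number of attention heads; split them across layers.
    All other hyperparameters (d_model, feedforward width, ...) are held
    constant, so only the aspect ratio changes."""
    layer = nn.TransformerEncoderLayer(d_model=d_model,
                                       nhead=total_heads // n_layers)
    return nn.TransformerEncoder(layer, num_layers=n_layers)

wide = make_encoder(n_layers=1)    # 16 heads in a single layer
deep = make_encoder(n_layers=4)    # 4 heads in each of 4 layers
```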
No existing spherical convolutional neural network (CNN) framework is both computationally scalable and rotationally equivariant. Continuous approaches capture rotational equivariance but are often computationally prohibitive. Discrete approaches offer more favorable computational performance, but at the cost of equivariance. We develop a hybrid discrete-continuous (DISCO) group convolution that is simultaneously equivariant and computationally scalable to high resolution. While our framework can be applied to any compact group, we specialize to the sphere. Our DISCO spherical convolutions exhibit not only $\text{SO}(3)$ rotational equivariance but also a form of asymptotic $\text{SO}(3)/\text{SO}(2)$ rotational equivariance, which is more desirable for many applications (where $\text{SO}(n)$ is the special orthogonal group representing rotations in $n$ dimensions). Through a sparse tensor implementation we achieve linear scaling in the number of pixels on the sphere for both computational cost and memory usage. For 4k spherical images we realize a saving of $10^9$ in computational cost and $10^4$ in memory usage compared to the most efficient alternative equivariant spherical convolution. We apply the DISCO spherical CNN framework to a number of benchmark dense-prediction problems on the sphere, such as semantic segmentation and depth estimation, on all of which we achieve state-of-the-art performance.
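The scalability claim rests on the discretised convolution reducing to a sparse matrix-vector product. The sketch below uses random placeholder footprints rather than genuine rotated filter samples, but it shows why the cost is linear in the number of pixels:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Each output location g collects sum_x f(x) * psi(g^{-1} x); rows index
# output pixels and the few nonzeros per row hold discretised filter values.
n_pix, nnz_per_row = 1000, 9              # e.g. a small filter footprint
rng = np.random.default_rng(0)

rows = np.repeat(np.arange(n_pix), nnz_per_row)
cols = rng.integers(0, n_pix, n_pix * nnz_per_row)   # placeholder footprints
vals = rng.standard_normal(n_pix * nnz_per_row)      # placeholder psi samples

K = csr_matrix((vals, (rows, cols)), shape=(n_pix, n_pix))
f = rng.standard_normal(n_pix)            # signal on the sphere's pixels
out = K @ f                               # one DISCO-style convolution:
                                          # O(n_pix * nnz_per_row) work
```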
In this study, the radiomics approach is extended to optical fluorescence molecular imaging data for tissue classification, termed "optomics". Fluorescence molecular imaging is emerging for precise surgical guidance during resection of head and neck squamous cell carcinoma (HNSCC). However, tumor-to-normal tissue contrast is confounded by intrinsic physiological limitations of heterogeneous expression of the target molecule, epidermal growth factor receptor (EGFR). Optomics seeks to improve tumor identification by probing textural pattern differences in EGFR expression conveyed by fluorescence. A total of 1,472 standardized optomic features were extracted from fluorescence image samples. A supervised machine learning pipeline involving a support vector machine classifier was trained with the 25 top-ranked features selected by the minimum redundancy maximum relevance criterion. Model predictive performance was compared to a fluorescence intensity thresholding method by classifying image patches of resected tissue with histologically confirmed malignancy status. The optomics approach provided consistent prediction accuracy across all test set samples, irrespective of dose, compared to fluorescence intensity thresholding (mean accuracy of 89% vs. 81%; P = 0.0072). The improved performance demonstrates that extending the radiomics approach to fluorescence molecular imaging data offers a promising image analysis technique for cancer detection in fluorescence-guided surgery.
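A plausible shape for the classification pipeline is sketched below; `SelectKBest` is a stand-in for the paper's minimum-redundancy maximum-relevance (mRMR) criterion, which scikit-learn does not provide:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC

# Texture features extracted per image patch (1,472 here) are reduced to
# the 25 top-ranked features and fed to a support vector machine.
model = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=25),   # stand-in for mRMR feature selection
    SVC(kernel="rbf"),
)
# model.fit(X_train, y_train)       # X: (n_patches, 1472), y: malignant?
```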
Analyzing athletic performance or preventing injury requires capturing the ground reaction forces (GRFs) exerted by the human body during certain movements. Standard practice uses physical markers paired with force plates in a controlled environment, but this is undermined by high costs, lengthy implementation times, and variance across repeated experiments; we therefore propose GRF inference from video. While recent work has used LSTMs to estimate GRFs from 2D viewpoints, their modeling and representation capacity can be limited. First, we propose using a Transformer architecture to tackle the GRF-from-video task, the first approach to do so. We then introduce a new loss that targets the high-impact peaks of the regressed curves. We also show that pre-training on 2D-to-3D human pose estimation, together with multi-task learning, improves generalization to unseen motions; pre-training on this different task provides good initial weights when fine-tuning on smaller (rarer) GRF datasets. We evaluate on the LAAS Parkour dataset and a newly collected ForcePose dataset, and show a 19% reduction in error compared to prior approaches.
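One way such a peak-focused loss could look is sketched below; the exact formulation in the paper may differ, and `alpha` is an invented weighting knob:

```python
import torch

def peak_weighted_mse(pred, target, alpha=4.0):
    """Plain MSE with each timestep re-weighted by the normalised
    magnitude of the ground-truth GRF, so that errors at sharp,
    high-impact peaks dominate the gradient."""
    w = 1.0 + alpha * target.abs() / (target.abs().max() + 1e-8)
    return (w * (pred - target) ** 2).mean()

pred = torch.randn(8, 100, requires_grad=True)   # predicted GRF curves
target = torch.rand(8, 100)                      # ground-truth GRF curves
peak_weighted_mse(pred, target).backward()
```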
In this paper, we study online reinforcement learning (RL) for partially observable dynamical systems. We focus on the predictive state representation (PSR) model, an expressive model that captures other well-known models such as partially observable Markov decision processes (POMDPs). PSRs represent states using a set of predictions of future observations and are defined entirely in terms of observable quantities. We develop a novel model-based algorithm for PSRs that can learn a near-optimal policy with sample complexity scaling polynomially in all relevant parameters of the system. Our algorithm naturally works with function approximation to scale to systems with large state and observation spaces. We show that, given a realizable model class, the sample complexity of learning a near-optimal policy scales only with the statistical complexity of the model class, without any explicit polynomial dependence on the sizes of the state and observation spaces. Notably, our work is the first to show that polynomial sample complexity suffices to compete with the globally optimal policy in PSRs. Finally, we demonstrate how our general theorem can be used directly to derive sample complexity bounds for special models, including $m$-step weakly revealing and $m$-step decodable tabular POMDPs, POMDPs with low-rank latent transitions, and POMDPs with linear emissions and latent transitions.
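The PSR state update can be written in a few lines; the operators `M` and normalizer `m_inf` below are invented placeholders for the quantities the algorithm would learn:

```python
import numpy as np

def psr_update(p, a, o, M, m_inf):
    """Linear PSR belief update after taking action a and observing o:
    p' = M[a][o] p / (m_inf . M[a][o] p), so the entries remain valid
    predictions (conditional probabilities) of the core tests."""
    q = M[a][o] @ p
    return q / (m_inf @ q)

# Hypothetical learned quantities for a 2-test PSR with one action and
# two observations (illustration only, not from the paper):
M = {0: {0: np.array([[0.7, 0.1], [0.2, 0.5]]),
         1: np.array([[0.1, 0.3], [0.2, 0.3]])}}
m_inf = np.array([1.0, 1.0])     # normaliser: m_inf . p = 1 for valid p
p = np.array([0.6, 0.4])         # current predictions of the core tests
p = psr_update(p, a=0, o=1, M=M, m_inf=m_inf)
```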
Significant theoretical work has established that, in specific regimes, neural networks trained by gradient descent behave like kernel methods. However, in practice it is well known that neural networks strongly outperform their associated kernels. In this work, we explain this gap by demonstrating that there is a large class of functions which cannot be efficiently learned by kernel methods but which can be easily learned with gradient descent, by learning representations that are relevant to the target task. We also demonstrate that these representations allow for efficient transfer learning, which is impossible in the kernel regime. Specifically, we consider the problem of learning polynomials that depend on only a few relevant directions, i.e. functions of the form $f^\star(x) = g(Ux)$ where $U: \mathbb{R}^d \to \mathbb{R}^r$ with $d \gg r$. When the degree of $f^\star$ is $p$, it is known that $n \asymp d^p$ samples are necessary to learn $f^\star$ in the kernel regime. Our primary result is that gradient descent learns a representation of the data that depends only on the directions relevant to $f^\star$, leading to an improved sample complexity of $n \asymp d^2 r + d r^p$. Furthermore, in a transfer learning setup where the data distributions in the source and target domains share the same representation $U$ but have different polynomial heads, we show that a popular heuristic for transfer learning has a target sample complexity independent of $d$.
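The function class is easy to instantiate; the sizes, subspace `U`, and head `g` below are illustrative choices, not the paper's experiments:

```python
import numpy as np

# A degree-p polynomial on R^d that depends on x only through r << d
# relevant directions (the rows of U).
d, r, p = 100, 3, 3
rng = np.random.default_rng(0)

U = np.linalg.qr(rng.standard_normal((d, r)))[0].T   # (r, d), orthonormal rows

def f_star(x):
    z = x @ U.T                    # project onto the relevant subspace
    return (z ** p).sum(axis=-1)   # a simple degree-p "head" g

X = rng.standard_normal((1000, d))
y = f_star(X)   # kernel methods need n ~ d^p samples here; gradient
                # descent on a two-layer net needs only n ~ d^2 r + d r^p
```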